Abstract

The overall goal of our study was to compare the proteins found in the saliva proteomes of three 
mammals: human, mouse and rat. Our first objective was to compare two human proteomes with 
very different analysis depths. The 89 shared proteins in this comparison apparently represent a 
core of highly-expressed human salivary proteins. Of the proteins unique to each proteome,
one-half to two-thirds lack signal peptides and are probably contaminants rather than less highly-represented
salivary proteins. We recently published the first rodent saliva proteomes with salivas collected 
from the genome mouse (C57BL/6) and the genome rat (BN/SsNHsd/Mcwi). Our second 
objective was to compare the proteins in the human proteome with those we identified in the 
genome mouse and rat to determine those common to all three mammals as well as the specialized 
rodent subset. We also identified proteins unique to each of the three mammals because 
differences in the secreted protein constitutions can provide clues to differences in the 
evolutionary adaptation of the secretions in the three different mammals. 

1. Introduction

The advent of genomic and proteomic sciences has provided a flood of new information 
about genes expressed to produce the array of proteins characteristic of a particular tissue. 
Determining which genes are expressed in a particular type of cell and/or in the fluid it 
secretes can be done by assaying either RNA transcripts, a translated protein product or, 
ideally, both. Mammals including primates and rodents produce and secrete proteins into 
saliva from three major salivary glands: the parotid, sublingual and submandibular glands, 
as well as other minor sources (e.g. tongue). 

Salivary glands produce the proteins necessary to initiate digestion, to lubricate the hard and
soft tissues of the mouth and to protect against infection. Primary salivary gland malfunction 
can occur due to viral or bacterial infection, autoimmune disease (e.g. Sjögren's
syndrome [1]), calcium stone formation which blocks secretion, or tumor development 
and/or invasion. Medications and radiation treatment can also inhibit salivary gland 
function. Decrease in saliva production leads to breakdown of teeth and the other oral cavity 
structures and much attention is focused on maintaining appropriate salivary gland function. 

We previously obtained saliva proteomes of the genome mouse (C57BL/6) and the genome 
rat (BN/SsNHsd/Mcwi) using multidimensional protein identification technology 
(MUDPIT) for the purpose of studying rapidly evolving proteins and their genes [2]. That 
publication focused on the independent expansions of the mouse and rat kallikrein 
subfamilies expressed in saliva and how selection influenced their evolution. 

The overall goal of the project we report here was to compare the proteins found in the 
saliva proteomes of three mammals: human, mouse and rat in order to identify proteins 
shared and unique to one or more taxa. We selected two different human saliva proteomes to 
compare and contrast with our rodent saliva proteomes[2]. One human saliva proteome[3] 
was produced from whole saliva and analyzed at a depth similar to the rat and mouse 
proteomes we produced, while the second[4] reported a far more extensive human saliva 
proteome from salivary gland duct secretions collected by three different groups 
participating in a consortium. Because these two human proteomes differ both in collection 
and analysis techniques, our first objective was to compare the identifications made by the 
two studies. Our questions are: 

1. Which proteins are shared between the two human saliva proteomes and which are 
not? 

2. Does a deeper proteome necessarily improve the protein representation of salivary 
gland secretions? 

3. Does using saliva collected from individual salivary gland ducts, rather than whole 
saliva, improve the representation of salivary gland secretions in the final
analysis? 

The major advantage of proteomes is that proteins identified at a high probability from two 
or more high quality peptides can be confidently believed to be present in the protein 
mixture analyzed. However, in secretions such as saliva or tears, one cannot conclude that 
every identified protein was secreted by the gland(s) producing that fluid. Proteins found in 
saliva are primarily secreted by salivary glands, but can also result from contamination from 
other sources (e.g. tracheal, naso-pharyngeal) or from cellular breakdown. We used the 
presence of a signal peptide as a surrogate for extracellular secretion [5] in order to eliminate 
from further consideration contaminating proteins most likely produced by cellular 
breakdown. 

The mouse and rat are widely used as experimental organisms in studies of human 
pathological conditions and so it is important to understand the ways in which their 
physiologies are comparable to human physiology and the ways in which they are not. 
Moreover, differences in the secreted salivary proteins can provide clues to differences in
the evolutionary adaptation of the secretions in the three different mammals. Thus, our 
second objective was to determine which salivary proteins are shared among the three 
mammal proteomes and which are unique to one of them or shared by only two of them. For 
this objective, our questions were: 

1. What proteins are shared by the human, mouse and rat saliva proteomes, and which 
are shared by two of the three proteomes? 

2. Are the proteins shared between two or all three mammal proteomes encoded by 
genes with known evolutionary relationships, that is to say that they are 
orthologous or paralogous; or is their apparent similarity an accident of naming that
does not represent a true evolutionary relationship? 

3. What proteins are unique to the saliva proteomes of each of the three mammals? 

It was our ultimate goal to determine whether the proteins that appeared to be similar in two 
or more mammal saliva proteomes actually shared an evolutionary history, i.e. were 
orthologous/paralogous, or whether the similarity was superficial and they do not share an 
evolutionary history. Superficial similarities can arise when characteristics that may have 
occurred as the result of convergent evolution (e.g. a high representation of an amino acid 
such as proline) result in similar naming, but where a shared evolutionary history is lacking 
in the two taxa under scrutiny. In discussing potentially shared evolutionary histories, we 
tried to take into consideration the similarities and differences in rodent and human 
nutritional physiology and behavior. 

2. Experimental 

2.1. Protein identification from proteomic data 

The materials and LC-MS/MS methods were reported previously for the human[3,4], mouse 
(C57BL/6) and rat (BN/SsNHsd/Mcwi)[2] saliva proteomes. The information on rat Klk1
gene subfamily expression in the Sprague-Dawley strain can also be found in[2]. 

The spectra from the two human studies were identified by searching against two different 
databases, the human-only entries in the Swiss-Prot (Swiss-Prot, Release 42.0, October 
2003)[3] and the European Bioinformatics Institute (EBI) human International Protein Index 
(IPI) database (version 3.01; release date November 1, 2004)[4]. To compare these 
identifications, we first converted the two sets of data to the UniProt format, and this was
especially important in view of the deactivation of the IPI database. We used the UniProt ID 
Mapping function to batch convert IPI numbers (www.uniprot.org). Some IPI numbers 
could not be converted to UniProt in that way, thus we used the NCBI protein search 
function to convert the remaining IPI numbers (http://www.ncbi.nlm.nih.gov/). One hundred 
and eighty-eight proteins from[4] were not successfully converted from IPI to UniProt 
Accession numbers and these were eliminated from further analysis. Furthermore, some 
proteins have several IPI numbers that convert to the same UniProt number and there are 
also proteins with one IPI number that correspond to multiple UniProt numbers. In those 
cases, we evaluated each protein number and retained only the validated or most recently 
reviewed UniProt number. See Fig. 1 for a summary of this and downstream processes. 
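
The clean-up rules described above (drop IPI numbers that cannot be converted, collapse several IPI numbers that map to one UniProt number, and keep a single UniProt number when one IPI number maps to several) can be sketched roughly as follows. This is an illustration only, not the procedure actually run for the study: the function name `resolve_ipi_mapping`, the `Candidate` record, and all identifiers in the example are hypothetical placeholders, and the "prefer a reviewed entry" rule is an assumed simplification of the manual evaluation described in the text.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    accession: str  # UniProt accession returned for an IPI number
    reviewed: bool  # assumed flag: True if the UniProt entry is reviewed


def resolve_ipi_mapping(mapping):
    """Collapse an IPI -> candidate-list mapping to one accession per IPI.

    Returns (resolved, unmapped). `resolved` maps each convertible IPI number
    to a single accession; `unmapped` lists IPI numbers with no candidate,
    which are excluded from further analysis.
    """
    resolved = {}
    unmapped = []
    for ipi, candidates in mapping.items():
        if not candidates:
            unmapped.append(ipi)
            continue
        # Assumed rule: prefer a reviewed entry, else take the first candidate.
        reviewed = [c for c in candidates if c.reviewed]
        chosen = reviewed[0] if reviewed else candidates[0]
        resolved[ipi] = chosen.accession
    return resolved, unmapped


# Hypothetical example data; none of these identifiers are real.
example = {
    "IPI_A": [Candidate("ACC1", True)],
    "IPI_B": [Candidate("ACC1", True)],                            # many-to-one
    "IPI_C": [Candidate("ACC3", False), Candidate("ACC2", True)],  # one-to-many
    "IPI_D": [],                                                   # not convertible
}
resolved, unmapped = resolve_ipi_mapping(example)
unique_accessions = sorted(set(resolved.values()))  # duplicates collapse here
```

Taking the set of resolved accessions at the end is what removes the duplicates produced when several IPI numbers point to the same UniProt number.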

2.2. Sorting shared and non-shared human salivary proteins

Microsoft Access (http://office.microsoft.com/en-us/access/) was used to compare the 
proteins identified in human proteomes[3,4] by designing queries to search for shared 
UniProt Accession numbers in both proteomes, and to search for UniProt numbers unique to 
each proteome (Fig. 1). To identify unique proteins in[4], the UniProt Accession numbers 
were searched against those found in[3] using "Is Null" criteria. This query was rerun 
comparing the[3] proteome against the[4] proteome to produce proteins unique to [3]. 
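
The same sorting logic can be expressed outside Microsoft Access. The sketch below uses Python's built-in `sqlite3` module as a stand-in, with an inner join for the shared accessions and a left join filtered on `IS NULL` for the unique ones, which is the SQL counterpart of the "Is Null" criterion mentioned above. The table names `a` and `b`, the function `compare_proteomes`, and the accessions in the example are hypothetical; the actual Access queries are not reproduced here.

```python
import sqlite3


def compare_proteomes(accessions_a, accessions_b):
    """Return (shared, unique_to_a, unique_to_b) as sorted accession lists."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (acc TEXT PRIMARY KEY)")
    con.execute("CREATE TABLE b (acc TEXT PRIMARY KEY)")
    # INSERT OR IGNORE silently skips duplicated accessions.
    con.executemany("INSERT OR IGNORE INTO a VALUES (?)", [(x,) for x in accessions_a])
    con.executemany("INSERT OR IGNORE INTO b VALUES (?)", [(x,) for x in accessions_b])

    shared_sql = "SELECT a.acc FROM a INNER JOIN b ON a.acc = b.acc ORDER BY a.acc"
    # Anti-join: keep rows of the left table that have no match on the right.
    unique_sql = (
        "SELECT l.acc FROM {left} AS l LEFT JOIN {right} AS r "
        "ON l.acc = r.acc WHERE r.acc IS NULL ORDER BY l.acc"
    )
    shared = [row[0] for row in con.execute(shared_sql)]
    unique_a = [row[0] for row in con.execute(unique_sql.format(left="a", right="b"))]
    unique_b = [row[0] for row in con.execute(unique_sql.format(left="b", right="a"))]
    con.close()
    return shared, unique_a, unique_b


# Hypothetical accessions standing in for the two human proteomes.
shared, unique_a, unique_b = compare_proteomes(
    ["ACC1", "ACC2", "ACC3"],
    ["ACC2", "ACC3", "ACC4", "ACC5"],
)
```

Running the anti-join once in each direction mirrors rerunning the query with the two proteomes swapped.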

2.3. Identifying secreted and non-secreted proteins in the saliva proteomes 

SignalP (www.cbs.dtu.dk/services/SignalP/;[6]) was used to predict the presence or absence 
of a signal-peptide cleavage site for each protein to help determine whether or not that 
protein will be processed for secretion (Fig. 1). Proteins with a D score greater than 0.45 
were predicted to have a signal peptide and signal-peptide cleavage site, designating them as 
putative secreted proteins. Proteins with a D score below 0.45 were categorized as lacking
a signal peptide. 
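
As a minimal sketch of this classification step, the snippet below splits proteins by a D-score cutoff of 0.45. It assumes the D scores have already been obtained from SignalP and stored in a dictionary; the function names and the example scores are hypothetical. The text does not say how a score of exactly 0.45 is treated, so assigning it to the "no signal peptide" group here is an assumption.

```python
D_CUTOFF = 0.45  # D-score cutoff stated in the text


def has_signal_peptide(d_score, cutoff=D_CUTOFF):
    """True if the D score exceeds the cutoff (ties count as no signal peptide)."""
    return d_score > cutoff


def split_by_signal_peptide(d_scores):
    """Split {accession: D score} into (putative secreted, lacking signal peptide)."""
    secreted, non_secreted = [], []
    for accession, d_score in d_scores.items():
        if has_signal_peptide(d_score):
            secreted.append(accession)
        else:
            non_secreted.append(accession)
    return sorted(secreted), sorted(non_secreted)


# Hypothetical scores for illustration only.
example_scores = {"ACC1": 0.82, "ACC2": 0.12, "ACC3": 0.45}
secreted, non_secreted = split_by_signal_peptide(example_scores)
```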

2.4. Identifying similar proteins 

We grouped the shared human proteins with the most similar rodent proteins by UniProt ID 
and then tested for orthology and paralogy of their genes. Orthologies between human, 
mouse, and rat were computed using the "orthology" feature on www.genome.ucsc.edu, 
which identifies the best BLASTP hit and filters out non-syntenic hits [7]. For unclear 
protein identities, the Genome Browser Convert utility was used to locate the position of a 
gene in the genome assembly of other species [7]. During the conversion process, portions 
of the genome in the coordinate range of the original assembly are aligned to the new 
assembly while preserving their order and orientation. We double-checked all proteins found 
only in two of three taxa against the other taxon by identifying the ortholog's UniProt 
number with BLASTP, and manually searching the appropriate proteome for that protein. 

3. Results and Discussion

3.1. Comparing and contrasting the proteins identified in two human saliva proteomes 

We chose two human saliva proteomes of very different depths to compare and contrast. 
One study collected whole saliva from a single adult male and separated peptides with
two-dimensional chromatography linked to mass spectrometry[3]. The second study was far
more extensive, involving three different institutions in a consortium that produced a deeper 
proteome[4]. In that study, salivas were collected from subjects of both sexes using 
collection devices designed for each duct. The peptides were separated by a number of 
different methods before LC-MS/MS analysis of the peptide mixtures was performed. We 
wished to determine how the results from these two very different human saliva proteome 
studies compared and contrasted. 

3.2. Which proteins are shared between the two human saliva proteomes and which are
not?

Questions 1 and 2 of the first objective, posed in the Introduction, asked (1) which proteins
are shared between the two human saliva proteomes and which are not, and (2) whether a
deeper proteome necessarily improves the protein representation of salivary gland
secretions. Nearly all of the proteins identified in the shallower proteome (89/101; 88%)
were also found by the consortium project (SF1). Figure 1 shows the sorting flow chart.
Subsequent SignalP analysis showed that 66% of the shared proteins (59/89) have signal 
peptides and 34% (30/89) do not. Nearly 2/3 of the proteins uniquely identified by the
consortium[4] (569/885; SF2) lack signal peptides, as do 6/12 (50%) of proteins unique
to[3] (SF3). We interpret these findings to mean that the shared proteins in this comparison
represent a core of highly-expressed human salivary proteins, while those unique to a 
proteome are at least as likely[3] to twice as likely[4] to be contamination from intracellular 
and other sources. It probably should be expected that a deeper proteome may reveal less 
highly-represented proteins, but at the expense of detecting more contaminating proteins. 
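
The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The counts below are taken directly from the text; the helper `pct` and the variable names are ours, and the last two lines express the "as likely to twice as likely" statement as the ratio of proteins lacking a signal peptide to proteins having one, which is one reasonable reading of that phrase rather than a calculation given in the text.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to a whole number."""
    return round(100 * part / whole)


shared_of_shallow = pct(89, 101)        # shallower-proteome proteins also in [4]
shared_with_sp = pct(59, 89)            # shared proteins with a signal peptide
shared_without_sp = pct(30, 89)         # shared proteins lacking one
deep_unique_without_sp = pct(569, 885)  # proteins unique to [4] lacking one
shallow_unique_without_sp = pct(6, 12)  # proteins unique to [3] lacking one

# Ratio of proteins lacking a signal peptide to proteins having one
ratio_deep = 569 / (885 - 569)
ratio_shallow = 6 / (12 - 6)

print(shared_of_shallow, shared_with_sp, shared_without_sp)
print(deep_unique_without_sp, shallow_unique_without_sp)
print(round(ratio_deep, 2), ratio_shallow)
```

The ratio is 1.0 for the proteins unique to [3] and about 1.8 for those unique to [4], which is the sense in which unique proteins are "at least as likely to twice as likely" to lack a signal peptide.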

In Question 3 of the first objective, we asked whether using saliva collected from individual 
salivary gland ducts, rather than whole saliva, improved the representation of salivary gland 
secretions in the final analysis. Given that both human saliva proteomes agreed on most of 
the proteins identified in the shallower one and that 2/3 of the residual proteins in the deeper 
proteome lack signal peptides and are likely to be contaminants, we cannot conclude that 
one collection method was clearly superior to the other. It is probably safer to conclude that 
the different depths of analysis were more important than the sample collection methods. 

3.3. Proteins shared by mouse, rat and human saliva proteomes 

Our second objective was to compare the salivary proteins from the proteomes of the three 
mammals: human, mouse and rat, to determine the subset shared by all three mammal 
salivas, those shared by only two of the three (Table 1) and those that are unique to each of 
the three mammal salivas. Genes that are derived by speciation have been defined as 
orthologs and clearly share a common descent, whereas genes that evolved through 
duplication are called paralogs ([8,9] and reviewed in[10]). While it is clear that paralogs 
share an evolutionary history, they lack the direct 1 : 1 relationship of orthologs and may have 
different origins in different species. The third possibility is that evolutionarily unrelated 
proteins may share a common name. This is the null hypothesis against which we are testing 
potentially-related proteins. 

To begin the comparison, we separated the human saliva proteome into those proteins 
shared between the two studies[3,4] and those unique to just one. We first grouped the
shared human proteins with the most similar rodent proteins by UniProt ID and then tested 
for orthology and paralogy of their genes. The residual proteins in the rodent proteomes 
were then compared to the unique proteins of[3] and[4] and their genes tested for orthology 
or paralogy. Those proteins left unmatched by these tests were considered to be the sets 
unique to each of the three mammals. 
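
Once proteins have been assigned to ortholog/paralog groups, the final sorting into shared and unique sets amounts to set arithmetic. The sketch below assumes each proteome has already been reduced to a set of group labels, which is the hard part and is not shown; the function `partition_three_way` is hypothetical, and the example uses only a handful of protein names mentioned in this paper rather than the full data.

```python
def partition_three_way(human, mouse, rat):
    """Partition group labels by which of the three proteomes contain them."""
    return {
        "all_three": human & mouse & rat,
        "human_mouse_only": (human & mouse) - rat,
        "human_rat_only": (human & rat) - mouse,
        "mouse_rat_only": (mouse & rat) - human,
        "human_unique": human - mouse - rat,
        "mouse_unique": mouse - human - rat,
        "rat_unique": rat - human - mouse,
    }


# Small illustrative subset of proteins named in the text, not the full lists.
core = {"carbonic anhydrase", "kallikrein-1", "nucleobindin"}
human = core | {"statherin"}
mouse = core | {"chitinase", "vomeromodulin"}
rat = core | {"chitinase", "contiguous repeat polypeptide"}

groups = partition_three_way(human, mouse, rat)
```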

Some proteins such as carbonic anhydrase, kallikrein-1 and nucleobindin are clearly
orthologs in all three mammals (Table 1). In other instances, such as alpha-amylase, two of 
the three mammals (mouse and rat) have orthologous genes, while the human gene is 
paralogous. Nonetheless they clearly share evolutionary histories as indicated by the fact 
that they are all located in chromosomal regions that are homologous in the three taxa. Other 
proteins are structurally related but non-orthologous. For example, five human proline-rich
proteins (PRPs) shared chromosomal region homology with two mouse and five rat PRPs, 
while three human proline-rich proteins including statherin have no corresponding proteins 
in the rodents. 

This part of the study allows us to comment further on the effect of proteome depth of 
protein detection. Figure 1 shows that the number of proteins in the pool shared by the 
human saliva proteome with one or the other rodent proteomes was augmented more than 
50% from the [4]-unique collection, but only 10% from the [3]-unique collection. While this 
supports the idea that a deeper proteome provides an advantage over a shallower one, we 
also note that a very large number of the residual proteins in the [4]-unique pool appear to be
contaminants as shown by our SignalP analysis. Of additional concern is that the shallower 
of the two human proteomes found all five members of the Ig secretory complex[3], while 
the deeper proteome missed the Ig lambda light-chain (Q6GMV7) and Ig alpha-2 chain C
region (P01877). Thus a deeper proteome clearly confers an overall advantage in protein 
representation, but this may not be true for all proteins. 

3.4. Proteins unique to rodent saliva

Clearly the three mammals share a core of proteins that play important roles in the early 
stages of digestion, in protecting and lubricating hard and soft surfaces and in 
immunological protection and maintenance of the oral cavity generally. Given the many 
decades of research on individual proteins playing these roles, this is hardly surprising. 
Perhaps more intriguing are the proteins shared by mouse and rat but absent from humans, 
especially since the mouse and rat are widely used as experimental organisms in studies of 
human pathological conditions and rodent-specific proteins may limit applicability of these 
models. The rodent-shared protein group (Table 1B) is 25% as large (7) as the core shared
between human and one or both rodents (28; Table 1A). Four of the seven rodent-unique
proteins are clearly orthologous while half of the proteins shared between humans and 
rodents include complex paralog/ortholog sets, reflecting more complex evolutionary 
histories. 

The mouse and rat secrete chitinase, common salivary protein, deoxyribonuclease, odorant 
binding protein, ovostatin, proline-rich lacrimal 1 protein and submandibular gland protein 
into their salivas that humans do not. Other studies have shown that both rodents are capable 
of expressing an impressive array of kallikreins from subfamilies that are unique to each 
genome[11] (see below).

These important differences in secreted salivary proteins may provide clues to differences in 
the evolutionary adaptation of the secretions in the three different mammals. For example, it 
is possible that chitinase and deoxyribonuclease in rodent saliva provide the potential for 
digesting food sources more available to rodents than to humans. We also note that some of 
the proteins unique to rodent saliva proteomes may play a primary or secondary role in
grooming and pelage maintenance. Humans are one of the few mammals without a pelage of 
fur or wool covering nearly the entire body and thus the potential roles of proteins involved 
in grooming and pelage maintenance are not included in most human-centric discussions of 
saliva constitution. For example, we have previously shown that mice coat their pelts with 
salivary androgen-binding protein (ABP;[12]) and we suggested that this was a means of 
advertising the subspecies of the animal since ABP has been implicated in mediating 
subspecies identification[12-15]. A general role in coating surfaces was later proposed for 
secretoglobins such as ABP by Dominguez[16] following the first report of substantial 
identities among rabbit uteroglobin, cat Fel d1 and mouse ABP by Karn[17]. One can
envision that a surface coating might include a chitinase that could defend against 
ectoparasites by attacking their exoskeletons. 

The presence of the unique array of salivary kallikreins in rodent salivas is a knotty problem, 
given that, at least in mouse saliva, they show extensive sex-limited expression. Rodent 
species including the house mouse (Mus musculus) and some strains of rats (Rattus
norvegicus) show impressive elaboration of a specific tissue of the submandibular gland, the 
granulated convoluted tubular (GCT) tissue, often only in males following puberty[18]. This 
sex-limited tissue differentiation causes the submandibular glands with elaborated GCT to 
produce kallikrein serine proteases encoded in Klk1 gene subfamilies that have recently
expanded independently in house mice and rats[11]. This results in a clear sex-limited
expression of all but a few of these Klk1b subfamily kallikrein genes in male mice but the
picture is not so clear in rats [2]. The two strains of rat that have been studied to date show a
very different expression of their Klk1c subfamily kallikrein genes, with the genome rat not
expressing any of them while the Sprague-Dawley rat expresses the Klk1c kallikrein genes
in both sexes. Unfortunately, neither human saliva proteome project[3,4] addressed the issue 
of differential expression of proteins in males and females. Thus we cannot currently assess 
the contribution of sex-limited expression to the complement of proteins found by[4] that 
were not found by[3]. 

3.5. Proteins unique to each saliva proteome 

Removing the salivary proteins shared by two or three of the mammal proteomes allowed 
identification of the proteins unique to each of them (Files SF4-SF5). The human saliva
proteome contains a number of salivary proteins that distinguish it from the rodent 
proteomes, including the statherin-like PRPs, the histatins, zinc alpha glycoprotein and the 
Ig saliva secretory complex. Statherin prevents calcium phosphate precipitation in saliva, 
thus allowing calcium to be maintained at a supersaturated level in saliva to prevent 
deterioration of the teeth[19]. In addition to the physical shielding properties of the epithelial 
layer and mucin, components of innate immunity including lysozyme, lactoferrin and 
cystatins likely cooperate with adaptive humoral immunity mediated by antibodies in the Ig 
secretory complex to fight infection in the human oral cavity [20]. The presence of lysozyme 
and the Ig secretory complex in human but not in rodent saliva suggests that humans have 
more need of such weapons against infection. The remaining proteins appear to have an 
assortment of unrelated functions. Strikingly, the addition of the proteins unique to[3] and 
to[4] that have signal peptides brought the human list to 344. A brief survey of these 
proteins produced descriptions such as: uncharacterized protein, protein existence uncertain,
tissue specificity - epidermis, protein existence inferred from homology, and subcellular 
location - lysosome. In other words, the majority of these protein identifications seem to 
make up a highly heterogeneous collection of proteins and we suspect that many of them are 
contaminants in spite of having signal peptides. 

Of the 22 unique mouse salivary proteins, 2/3 consist of eleven Klk1b-encoded subfamily
kallikreins and three androgen-binding subunit proteins (total of 14), none of which have
human equivalents. The Klk1b subfamily kallikreins are expressed almost exclusively in
males and we have suggested, on the basis of new data, that the previous speculative 
function of the species-specific rodent kallikreins as important solely in wound healing in 
males be investigated further. In addition to or instead of that function, we proposed that 
their sex-limited expression, coupled with their rapid evolution, may be clues to an
as-yet-undetermined interaction between the sexes [2]. The three androgen-binding protein (ABP)
subunit proteins, which form dimers to produce mouse pheromones (reviewed in[21]), are 
found in both sexes of mice and have been proposed to be involved in incipient 
reinforcement where subspecies of mice make secondary contact[15]. Mice also secrete 
trypsinogen, a peptidase inhibitor, MUP5, EGF binding protein, vomeromodulin, a 
glycoprotein and two poorly characterized proteins. 

The genome rat saliva proteome has only three unique proteins: contiguous repeat 
polypeptide, an alpha-2 microglobulin distinct from the shared version, and an 
uncharacterized protein with similarity to GRPCB. As we noted above, however, the saliva
of another rat strain also contains numerous rat-specific kallikreins. Thus the question of
whether expression of species-specific kallikrein family genes is shared between the two 
rodents or unique to mice depends on the strain of rat in the comparison. 

4. Conclusions 

Much work has been done on individual salivary proteins in humans and other animals over 
the past five decades and there are relatively recent research papers and reviews that have 
focused on the human salivary proteins (e.g. PRPs[22]; a human saliva glycoprotein 
proteome [23]; a human proteomic study from a consortium of institutions [4]; and a 2013 
review of human salivary proteins [24]). Less has been done with rodent salivary proteins. 
We published the results of the application of multidimensional protein identification 
technology (MUDPIT), an LC-LC-MS/MS analysis, to stimulated mouse and rat saliva for
the purpose of studying rapidly evolving proteins and their genes [2]. 

It is possible that the comparison and contrast of the salivary protein components of human 
and rodent salivas that we have presented here has raised more questions than it has 
provided insights. Given that there has been no previous such study, however, we hope that 
at least we have framed some important questions, especially evolutionary ones for us and 
for others to pursue. Our one conclusion that we feel will be useful for future studies 
involving one or the other rodent as a model for human oral physiology is that there are 
significant differences in the protein constituents between the salivas of humans and rodents 
and that these could be misleading if not taken into consideration. 

[Figure 1 flow chart label: 38 human salivary proteins with orthologs or paralogs expressed in mouse and/or rat saliva]



Figure 1. 

Flow chart for comparing the two human proteomes (steps 1, 2 and 3), and the human with 
rodent saliva proteomes. Step 1: the IPI accession numbers of proteome [4] were converted 
to UniProt accession numbers; Step 2: proteins in the two proteomes were sorted by their 
UniProt numbers; Step 3: proteins were grouped by signal peptide status.
a: Duplicated UniProt numbers were removed.